Abstract: The use of small cell deployments in heterogeneous
network (HetNet) environments is expected to be a key feature
of 4G networks and beyond, and essential for providing higher
user throughput and cell-edge coverage. However, due to different
coverage sizes of macro and pico base stations (BSs), such a
paradigm shift introduces additional requirements and challenges
in dense networks. Among these challenges is the handover
performance of user equipment (UEs), which will be impacted
especially when high velocity UEs traverse picocells. In this paper,
we propose a coordination-based and context-aware mobility
management (MM) procedure for small cell networks using
tools from reinforcement learning. Here, macro and pico BSs
jointly learn their long-term traffic loads and optimal cell range
expansion, and schedule their UEs based on their velocities and
historical rates (exchanged among tiers). The proposed approach
is shown to not only outperform the classical MM in terms of
UE throughput, but also to enable better fairness. On average,
a gain of up to 80% is achieved for UE throughput, while the
handover failure probability is reduced by up to a factor of three
by the proposed learning based MM approaches.

Index Terms: Cell range expansion, HetNets, load balancing,
mobility management, reinforcement learning, context-aware 
scheduling. 


I. Introduction 

The deployment of Long Term Evolution (LTE) heterogeneous
networks (HetNets) is a promising approach to meet the
ever-increasing wireless broadband capacity challenge [?], [?].
However, deploying HetNets entails a number of challenges
in terms of capacity, coverage, mobility management (MM),
and mobility load balancing (MLB) across multiple network
tiers [?]. Mobility management is essential to ensure
continuous connectivity to mobile user equipment (UEs) while
maintaining quality of service (QoS).

The mobility framework for LTE was originally developed
and analyzed by the 3rd Generation Partnership Project (3GPP)
for macro-only networks, and was therefore not explicitly
optimized for HetNets. In LTE Rel. 11, mobility enhancements
in HetNets have been investigated through a dedicated
study item [?]. 3GPP has defined key performance indicators
(KPIs) for mobility measurements, i.e., the handover failure
(HOF) due to a degraded signal-to-interference-plus-noise
ratio (SINR), the radio link failure (RLF), as well as the
probability of unnecessary handovers, typically referred to as
ping-pong (PP) events.

Poor MM approaches may increase the HOFs, RLFs, and
PPs, and result in unbalanced load among cells. This entails
a low resource utilization efficiency and hence a deterioration
of the user experience. In order to solve this problem while
minimizing PPs, mobility parameters in each cell need to be
carefully and dynamically optimized according to cell traffic
loads. It is essential to optimize handover parameters such
as time to trigger (TTT), range expansion bias (REB), and
hysteresis margin in order to answer the question: "when to
hand over which UE to which cell?"

Mobility management techniques for HetNets have been
recently investigated in the literature, e.g., in [?]-[?]. In [?], the
authors evaluate the effect of different combinations of MM
parameter settings for HetNets. The main result is that mobility
performance strongly depends on the cell size and UE speed.
The simulations in [?] assume that all UEs have the same
velocity in each simulation setup. In [?], the authors evaluate
the mobility performance of HetNets considering almost blank
subframes in the presence of cell range expansion and propose
a mobility based intercell interference coordination (ICIC)
scheme. Hereby, picocells configure coordinated resources by
muting certain subframes so that macrocells can schedule their
high velocity UEs in these resources without co-channel
interference from picocells. However, the proposed approach only
considers three broad classes of UE velocities: low, medium,
and high. Moreover, no adaptation of the REB has been taken
into account. In [6], a handover-aware ICIC approach based
on reinforcement learning is proposed. Hereby, the authors
model the ICIC approach as a sub-band selection problem for
mobility robustness optimization in a small cell only network.
In [?], the cell selection problem in HetNets is formulated as
a network wide proportional fairness optimization problem by
jointly considering the long-term channel condition and load
balance in a HetNet. While the proposed method enhances
the cell-edge UE performance, no results related to mobility
parameters are presented.

To the best of our knowledge there is no previous work
related to learning based mobility management in HetNets
that jointly considers load balancing and UE scheduling.
In this paper, we propose a joint MM and context-aware
UE scheduling approach by using tools from reinforcement
learning. Hereby, each base station (BS) individually optimizes
its own strategy (REB, UE scheduling) based on limited
coordination among tiers. Both macro- and picocells learn
how to optimize their traffic load in the long term and the
UE association process in the short term by performing history
and velocity based scheduling. We propose multi-armed bandit
(MAB) and satisfaction based MM learning approaches aiming
at improving the overall system performance and reducing the
HOF and PP probabilities.

Fig. 1: a) Classical MM framework w/o picocell coverage
optimization, b) Proposed learning based MM framework
considering velocity and history (average rate) based scheduling.
Panel labels: "Classical MM: w/o picocell coverage"; "Macro/pico
coordination: average rate of UE is exchanged in case of handover";
"Learning based MM: macro-/picocell coverage optimization".

To illustrate the differences between the classical MM
and our proposed approach, we depict in Fig. 1 a) and
Fig. 1 b) the basic idea of the classical MM and proposed
MM approaches, respectively. In the classical MM approach,
there is no information exchange among tiers in case of UE
handover, and traffic offloading might be achieved by picocell
range expansion. In the proposed MM approaches, instead,
each cell individually optimizes its own MM strategy based
on limited coordination among tiers. The major difference
between MAB and satisfaction based learning is that MAB
aims at maximizing the overall capacity while satisfaction
based learning aims at satisfying the network in terms of
capacity. In both cases, macro and pico BSs learn in the
long term how to optimize their REB, which results in load
balancing. In the short term, based on these optimized REB
values, each cell carries out user scheduling by considering
each UE's velocity and average rate, through coordinated effort
among the tiers. Our contributions are as follows:

• In the proposed MM approaches, we focus on both 
short-term and long-term solutions. In the long-term, a 
traffic load balancing procedure in a HetNet scenario 
is proposed, while in the short-term the UE association 
process is solved. 

• To implement the long-term load balancing method, we 
propose two learning based MM approaches by using 
reinforcement learning techniques: a MAB based and a 
satisfaction based MM approach. 


• The short-term UE association process is based on a
proposed context-aware scheduler considering a UE’s 
throughput history and velocity to enable fair scheduling 
and enhanced cell association. 

The rest of the paper is organized as follows. Section II 
describes the system model, the problem formulation for 
MM, and the context-aware scheduler. In Section III, we 
introduce the learning based MM approaches. Section IV 
presents system level simulation results, and finally, Section V
concludes the paper. 


II. System Model 


We focus on the downlink transmission of a 2-layer HetNet,
where layer 1 is modeled as macrocells and layer 2 as
picocells. The HetNet consists of a set of BSs $\mathcal{K} = \{1,\dots,K\}$
with a set $\mathcal{M} = \{1,\dots,M\}$ of macrocells underlaid by a
set $\mathcal{P} = \{1,\dots,P\}$ of picocells, where $\mathcal{K} = \mathcal{M} \cup \mathcal{P}$. Macro
BSs are dropped following a hexagonal layout including three
sectors. Within each macro sector $m$, $p \in \mathcal{P}$ picocells are
randomly positioned, and a set $\mathcal{U} = \{1,\dots,U\}$ of UEs
is randomly dropped within a circle around each
picocell $p$ (hotspot). The UEs associated to macrocells are
referred to as macro UEs $\mathcal{U}(m) = \{1(m),\dots,U(m)\} \in \mathcal{U}$
and the UEs served by picocells are referred to as pico UEs
$\mathcal{U}(p) = \{1(p),\dots,U(p)\} \in \mathcal{U}$, where $\mathcal{U}(p) \neq \mathcal{U}(m)$. Each
UE $i(k)$ with $k \in \{m,p\}$ has a randomly selected velocity
$v_{i(k)} \in \mathcal{V}$ km/h and a random direction of movement within
an angle of $[0; 2\pi]$. A co-channel deployment is considered,
in which picocells and macrocells operate in a system with
a bandwidth $B$ consisting of $r = \{1,\dots,R\}$ resource blocks
(RBs). At every time instant $t_n = nT_s$ with $n = [1,\dots,N]$
and $T_s = 1$ ms, each BS $k$ decides how to expand its
coverage area by learning its REB $\beta_k = \{\beta_m, \beta_p\}$ with
$\beta_m = \{0; 3; 6\}$ dB and $\beta_p = \{0; 3; 6; 9; 12; 15; 18\}$ dB*. Both
macro and pico BSs select their REB to decide which UE
$i(k)$ to schedule on which RB based on the UE's context
parameters. These context parameters are defined as the UE's
velocity $v_{i(k)}$, its instantaneous rate $\phi_{i(k)}(t_n)$ when associated
to BS $k$, and its average rate defined as
$\bar{\phi}_{i(k)}(t_n) = \frac{1}{T} \sum_{n=1}^{N} \phi_{i(k)}(t_n)$, whereby $T = NT_s$ is a time window. The
instantaneous rate $\phi_{i(k)}(t_n)$ is given by:

$$\phi_{i(k)}(t_n) = B_{i(k)} \cdot \log\left(1 + \gamma_{i(k)}(t_n)\right), \tag{1}$$


with $\gamma_{i(k)}(t_n)$ being the SINR of UE $i(k)$ at time $t_n$, which
is defined as:

$$\gamma_{i(k)}(t_n) = \frac{p_k \cdot g_{i(k),k}(t_n)}{\sum_{j \in \mathcal{K},\, j \neq k} p_j \cdot g_{i(k),j}(t_n) + \sigma^2}, \tag{2}$$

with $p_k$ being the transmit power of BS $k$, $g_{i(k),k}(t_n)$
being the channel gain from cell $k$ to UE $i(k)$ associated to
BS $k$, and $\sigma^2$ the noise variance. The bandwidth $B_{i(k)}$ in
equation (1) is the bandwidth which is allocated to UE $i(k)$
by BS $k$ at time $t_n$.


*We consider lower REB values for macro BSs to avoid overloaded 
macrocells due to their large transmission power. 
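
As a rough numerical illustration of (1) and (2), the following Python sketch computes the SINR and the instantaneous rate of a single UE. The function names, the base-2 logarithm (rate in bit/s), and the example power, gain, and noise values are assumptions made for this illustration; they are not taken from the simulation setup of this paper.

```python
import math

def sinr(p_serving, g_serving, interferers, noise_power):
    """SINR as in (2): received serving power over interference plus noise.

    interferers is a list of (p_j, g_j) pairs for all BSs j != k,
    all quantities in linear scale.
    """
    interference = sum(p_j * g_j for p_j, g_j in interferers)
    return (p_serving * g_serving) / (interference + noise_power)

def instantaneous_rate(bandwidth_hz, gamma):
    """Rate as in (1); the base-2 logarithm is an assumption of this sketch."""
    return bandwidth_hz * math.log2(1.0 + gamma)

# Hypothetical example: a pico UE with one interfering macro BS.
gamma = sinr(p_serving=1.0, g_serving=1e-9,
             interferers=[(40.0, 1e-11)], noise_power=1e-10)
rate = instantaneous_rate(180e3, gamma)  # one LTE resource block of 180 kHz
```

In this toy setting the macro interference dominates the noise, which is the situation in which the range expansion bias discussed above matters most.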
A. Handover Procedure

According to the 3GPP standard, the handover mechanism
is based on RSRP measurements, the filtering of measured
RSRP samples, the handover hysteresis margin, and TTT
mechanisms [?]. A handover is executed if the target cell's (biased)
RSRP (plus hysteresis margin) is larger than the source cell's
(biased) RSRP. In summary, the handover condition for a UE
$i(k)$ to BS $k$ is defined as:

$$P_l(i(l)) + \beta_l < P_k(i(k)) + \beta_k + m_{\text{hist}}, \tag{3}$$

with $\{l, k\} \in \mathcal{K}$, where $m_{\text{hist}}$ is the UE- or cell-specific
hysteresis margin, $\beta_k$ ($\beta_l$) is the REB of BS $k$ ($l$), and
$P_k(i(k))$ (or $P_l(i(l))$) [dBm] is the $i(k)$-th (or $i(l)$-th) UE's
RSRP from BS $k$ ($l$) after TTT.
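
The handover condition (3) can be checked with a few lines of Python. This is a minimal sketch: the function name, the argument names, and the example RSRP values are hypothetical, and the RSRP inputs are assumed to be the filtered measurements available after TTT.

```python
def handover_triggered(rsrp_source_dbm, bias_source_db,
                       rsrp_target_dbm, bias_target_db, hysteresis_db):
    """Condition (3): P_l + beta_l < P_k + beta_k + m_hist.

    l denotes the source cell and k the target cell.
    """
    return (rsrp_source_dbm + bias_source_db
            < rsrp_target_dbm + bias_target_db + hysteresis_db)

# Hypothetical macro UE close to a picocell: the pico RSRP is 5 dB weaker,
# so the handover only happens if the picocell bias is large enough.
with_6db_bias = handover_triggered(-90.0, 0.0, -95.0, 6.0, 0.0)   # -90 < -89
with_3db_bias = handover_triggered(-90.0, 0.0, -95.0, 3.0, 0.0)   # -90 < -92
```

The example shows how a larger REB at the picocell moves the cell border outward: the same UE is handed over with a 6 dB bias but stays at the macrocell with a 3 dB bias.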

B. Problem Formulation 

Our optimization approach aims at maximizing the total
rate of the network. Hereby, we consider long-term and
short-term processes. The long-term load balancing optimization
approach is solved by the proposed learning based MM
approaches presented in Section III-A and Section III-B,
which result in REB $\beta_k$ value optimization and in load
balancing $\phi_{k,\text{tot}}(t_n)$. Based on the estimated instantaneous
load, the context-aware scheduler selects, in the short term,
for each RB a UE by considering its history and velocity
as described in Section II-C. This results in each UE's
instantaneous rate $\phi_{i(k)}(t_n)$ and the RB allocation vector
$\alpha_{i(k)}(t_n) = [\alpha_{i(k),1}, \dots, \alpha_{i(k),R}]$ containing binary variables
$\alpha_{i(k),r}$ indicating whether UE $i(k)$ of BS $k$ is allocated
at RB $r$ or not. At each time instant $t_n$, each BS $k$ performs
the following optimization:


where $\text{sort}_{\min(v_{i(k)})}$ sorts the candidate UEs according to their
velocity starting with the slowest UE, i.e., if more than one UE
can be selected for RB $r$, the UE with minimum velocity is
selected. The rationale behind introducing the sorting/ranking
function for candidate UEs according to their velocity is that
high-velocity UEs will not be favored over slow moving UEs.

A scheduler according to (9) will allocate many (or even all)
resources to a newly handed over UE since its average rate
$\bar{\phi}_i(t_n)$ in the target cell is zero, i.e., in the classical Proportional
Fair scheduler, $\bar{\phi}_i(t_n) = \bar{\phi}_{i(k)}(t_n) = 0$ when a UE is handed
over to cell $k$, whereas we redefine it according to (10). To
avoid this and enable a fair resource allocation among all UEs
in a cell, we propose a history based scheduling approach. We
define the average rate $\bar{\phi}_i(t_n)$ according to (10), incorporating
the following idea: via the X2 interface, macro- and picocells
coordinate, so that once a macro UE $i(m)$ is handed over to
picocell $p$, its rate history at time instant $t_n$ is provided to
picocell $p$ in terms of the average rate $\bar{\phi}_{i(m)}(t_n)$, such that the
UE's (which is named $i(p)$ after the handover) average rate
at picocell $p$ becomes:

$$\bar{\phi}_{i(p)} = \frac{T \cdot \bar{\phi}_{i(m)}(t_n) + \phi_{i(p)}(t_n + T_s)}{T + T_s}. \tag{10}$$

In (10), a moving average rate is carried over from the macrocell
to the picocell, whereas in the classical MM approaches a UE's
history is not considered and is equal to zero. In other words,
the proposed MM approach considers the historical rate when
UE $i(m)$ was associated to the macrocell $m$ in the past.
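
To make the effect of (10) concrete, the following Python sketch evaluates the average rate of a UE directly after a macro-to-pico handover. The function name and the numbers are hypothetical. The time quantities are expressed in milliseconds, so that $T_s = 1$ as in Section II and (10) acts as a running mean; this choice of units is an assumption of the sketch.

```python
def average_rate_after_handover(avg_rate_macro, inst_rate_pico,
                                window_ms, ts_ms=1.0):
    """Average rate at the target picocell according to (10)."""
    return (window_ms * avg_rate_macro + inst_rate_pico) / (window_ms + ts_ms)

# Hypothetical numbers: a window of T = 1000 ms, a history of 2 Mbit/s at the
# macrocell, and 5 Mbit/s in the first subframe at the picocell.
with_history = average_rate_after_handover(2e6, 5e6, window_ms=1000.0)

# Classical MM: the history is not transferred, i.e. it is treated as zero.
classical_history = 0.0
```

With the transferred history, the denominator of the proportional fair metric in (9) starts close to 2 Mbit/s instead of zero, so the newly arrived UE does not monopolize the RBs of the picocell.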


III. Learning Based Mobility Management Algorithm


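$$\max_{\alpha_{i(k)}(t_n),\, \beta_k} \; \sum_{n=1}^{N} \sum_{i(k) \in \mathcal{U}_k} \sum_{r=1}^{R} \alpha_{i(k),r}(t_n) \cdot \phi_{i(k),r}(t_n) \tag{4}$$

subject to:

$$\sum_{i(k) \in \mathcal{U}_k} \alpha_{i(k),r} = 1 \quad \forall r, \forall k, \tag{5}$$

$$\dots \tag{6}$$

$$p_k \leq p_k^{\max}, \tag{7}$$

$$\phi_{i(k)}(t_n) \geq \phi_{k,\min}, \tag{8}$$

where $\phi_{i(k),r}(t_n)$ is the instantaneous rate of UE $i(k)$ at RB
$r$. The condition in (7) implies that the total transmitted power
over all RBs does not exceed the maximum transmission
power $p_k^{\max}$ of BS $k$.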


C. Context-Aware Scheduler 

The proposed MM approach does not only optimize the
load according to Section II-B, but also considers context-aware
and fairness based UE scheduling. At each RB $r$, a UE
$i(k)$ is selected to be served by BS $k$ on RB $r$ according to
the following scheduling criterion:

$$i(k)_r^{*} = \text{sort}_{\min(v_{i(k)})} \left( \arg\max_{i(k) \in \mathcal{U}_k} \frac{\phi_{i(k),r}(t_n)}{\bar{\phi}_i(t_n)} \right) \tag{9}$$
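
A per-RB selection in the spirit of (9) can be sketched in Python as follows. The dictionary keys, the helper name `schedule_rb`, and the example UEs are hypothetical. The sketch interprets the sorting function as a tie-break: among the UEs that maximize the proportional fair metric, the slowest one is served.

```python
import math

def schedule_rb(candidates):
    """Pick one UE for a single RB.

    candidates: list of dicts with the hypothetical keys 'ue', 'inst_rate',
    'avg_rate' and 'velocity_kmh'.
    """
    eps = 1e-9  # avoids a division by zero for a UE without any rate history

    def pf_metric(c):
        return c["inst_rate"] / max(c["avg_rate"], eps)

    best = max(pf_metric(c) for c in candidates)
    # All UEs that reach the maximum metric are candidates for this RB.
    tied = [c for c in candidates if math.isclose(pf_metric(c), best)]
    # Among them, the UE with minimum velocity is selected.
    return min(tied, key=lambda c: c["velocity_kmh"])["ue"]

ues = [
    {"ue": 1, "inst_rate": 2e6, "avg_rate": 1e6, "velocity_kmh": 120},
    {"ue": 2, "inst_rate": 4e6, "avg_rate": 2e6, "velocity_kmh": 3},
    {"ue": 3, "inst_rate": 1e6, "avg_rate": 1e6, "velocity_kmh": 3},
]
selected = schedule_rb(ues)
```

UEs 1 and 2 have the same metric, and the slow UE 2 is served, so a fast UE that is likely to leave the picocell soon is not preferred.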


To solve the optimization approach defined in Section
II-B, we rely on the self-organizing capabilities of HetNets
and propose an autonomous solution for load balancing by
using tools from reinforcement learning [?]. Hereby, each
cell develops its own MM strategy to perform optimal load
balancing based on the proposed learning based approaches
presented in Section III-A and Section III-B. To realize this,
we consider the game $\mathcal{G} = \{\mathcal{K}, \{\mathcal{A}_k\}_{k \in \mathcal{K}}, \{u_k\}_{k \in \mathcal{K}}\}$. Hereby,
the set $\mathcal{K} = \{\mathcal{M} \cup \mathcal{P}\}$ represents the set of players (i.e., BSs),
and for all $k \in \mathcal{K}$, the set $\mathcal{A}_k = \{\beta_k\}$ represents the set
of actions player $k$ can adopt. For all $k \in \mathcal{K}$, the function
$u_k(t_n)$ is the utility function of player $k$. The players learn
at each time instant $t_n$ to optimize the load in the long term and
to perform context-aware scheduling in the short term based on
the algorithms presented in Section III-A and III-B by the
following steps:

1) Action $a_k \in \mathcal{A}_k$ is selected based on the obtained utility
$u_k(t_n) = \phi_{k,\text{tot}}(t_n)$, with $\phi_{k,\text{tot}}(t_n)$ being the total rate
of player $k$ at time $t_n$ as defined in equation (11).

2) The action selection strategy is updated based on the
selected learning algorithm presented in Section III-A
and Section III-B.

3) UE $i(k)$ of BS $k$ is allocated at RB $r$ based on its velocity,
its instantaneous rate, and its average rate according to
(9).
A. Multi-Armed Bandit Based Learning Approach

The objective of the MAB approach is to maximize the
overall system performance. MAB is a machine learning
technique based on an analogy with the traditional slot
machine (one-armed bandit) [13]. When pulled at time $t_n$,
each machine/player provides a reward. The objective is to
maximize the collected reward through iterative pulls, i.e.,
learning iterations. The player selects its actions based on
a decision function reflecting the well-known
exploration-exploitation trade-off in learning algorithms.

The set of players, actions and the utility function for our
MAB based MM approach is defined as follows:

• Players: Macro BSs $\mathcal{M} = \{1,\dots,M\}$ and pico BSs
$\mathcal{P} = \{1,\dots,P\}$.

• Actions: $\mathcal{A}_k = \{\beta_k\}$ with $\beta_m = [0, 3, 6]$ dB and
$\beta_p = [0, 3, 6, 9, 12, 15, 18]$ dB being the CRE bias. We
consider higher bias values for picocells due to their low
transmit power. The considered bias values rely partially
on the assumptions in [10] and partially on extensive
simulation results.

• Strategy:

1) Every BS learns its optimum CRE bias value on a
long-term basis considering its load:

$$\phi_{k,\text{tot}}(t_n) = \sum_{i(k) \in \mathcal{U}_k} \sum_{r=1}^{R} \alpha_{i(k),r}(t_n) \cdot \phi_{i(k),r}(t_n). \tag{11}$$

This is inter-related with the handover triggering by
defining the cell border of each cell.

2) A UE is handed over to BS $k$ if it fulfills the
condition (3).

3) RB based scheduling is performed based on equation (9).


• Utility Function: The utility function in MAB learning is
a decision function composed of an exploitation term
represented by player $k$'s total rate and an exploration term
considering the number of times an action has been selected
so far. Player $k$ selects its action $a_{j(k)}(t_n) \in \mathcal{A}_k$ at time
$t_n$ by maximizing a decision function $d_{k,a_{j(k)}}(t_n)$,
which is defined as:

$$d_{k,a_{j(k)}}(t_n) = \bar{u}_{k,a_{j(k)}}(t_n) + \sqrt{\frac{2 \log\left(\sum_{i=1}^{|\mathcal{A}_k|} n_{k,a_{i(k)}}(t_n)\right)}{n_{k,a_{j(k)}}(t_n)}}, \tag{12}$$

whereby $\bar{u}_{k,a_{j(k)}}(t_n)$ is the mean reward of player $k$ at
time $t_n$ for action $a_{j(k)}$, $n_{k,a_{j(k)}}(t_n)$ is the number of
times action $a_{j(k)}$ has been selected by player $k$ until
time $t_n$, and $|\cdot|$ represents the cardinality.

During the first $t_n = |\mathcal{A}_k| \cdot T_s$, player $k$ selects each action
once in a random order to initialize the learning process by
receiving a reward for each action. For the following iterations
$t_n > |\mathcal{A}_k| \cdot T_s$, action selection is performed according
to Algorithm 1. In each learning iteration the action $a_{j(k)}$
that maximizes the decision function in (12) is selected. Then
the parameters are updated, whereby the following notation is
used: $s_{k,a_{j(k)}}(t_n)$ is the cumulated reward of player $k$ after
playing action $a_{j(k)}$, and $\mathbb{1}_{i=j}$ is equal to 1 if $i = j$ and zero
otherwise.

Algorithm 1 MAB based mobility management algorithm.

for $t_n$ do
  for $i = 1 : |\mathcal{A}_k|$ do
    Select action according to:
      $a_{j(k)} = \arg\max_{j(k) \in |\mathcal{A}_k|} \left( d_{k,a_{j(k)}}(t_n) \right)$
    Update parameters according to:
    Update the cumulated reward when player $k$ selects action $a_{j(k)}$:
      $s_{k,a_{j(k)}}(t_n + T_s) = s_{k,a_{j(k)}}(t_n) + \mathbb{1}_{i=j} \cdot \phi_{k,\text{tot}}(t_n)$
      $n_{k,a_{j(k)}}(t_n + T_s) = \dots$
      $\bar{u}_{k,a_{j(k)}}(t_n + T_s) = \dots$
  end for
end for
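
The MAB procedure described above can be illustrated with a small UCB-style learner in Python. This is a sketch under stated assumptions, not the simulator code used in this paper: the class name `UcbBiasLearner`, the toy reward function, and the normalization of the reward to $[0, 1]$ are hypothetical, and the exploration term uses the standard UCB1 form with a factor of 2 under the square root.

```python
import math
import random

class UcbBiasLearner:
    """UCB-style learner over a discrete set of REB values for one BS."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.counts = [0] * len(self.actions)   # times each action was played
        self.sums = [0.0] * len(self.actions)   # cumulated reward per action
        # Every action is played once, in random order, before UCB starts.
        self._init_order = random.sample(range(len(self.actions)),
                                         len(self.actions))

    def select(self):
        if self._init_order:
            return self._init_order.pop()
        total = sum(self.counts)

        def decision(j):
            mean = self.sums[j] / self.counts[j]             # exploitation
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[j])
            return mean + bonus                              # plus exploration

        return max(range(len(self.actions)), key=decision)

    def update(self, j, reward):
        self.counts[j] += 1
        self.sums[j] += reward

random.seed(0)
learner = UcbBiasLearner([0, 3, 6, 9, 12, 15, 18])  # pico REB values in dB
for _ in range(2000):
    j = learner.select()
    bias = learner.actions[j]
    reward = 1.0 - abs(bias - 9) / 9.0  # toy stand-in for the normalized rate
    learner.update(j, reward)
most_played = learner.actions[max(range(7), key=lambda j: learner.counts[j])]
```

In a system level simulation, the toy reward would be replaced by the total rate of the cell obtained with the selected bias, scaled to a common range so that the exploration bonus remains comparable to the mean reward.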

B. Satisfaction Based Learning Approach 

Satisfaction based learning approaches guarantee to satisfy
the players in a system [?]. Here, we consider the player
to be satisfied if its cell reaches a certain minimum level
of total rate and if at least 90% of the UEs in the cell
obtain a certain average rate. The rationale behind considering
these satisfaction conditions is to guarantee each single UE's
minimum rate while at the same time improving the total rate
of the cell.

To enable a fair comparison, the set of players and the
corresponding set of actions in the proposed satisfaction based
MM approach are the same as in the MAB based MM
approach. The utility function of player $k$ at time $t_n$ is defined
as the load according to equation (11). In the satisfaction based
learning approach, the actions are selected according to a
probability distribution $\pi_k(t_n) = [\pi_{k,1}(t_n), \dots, \pi_{k,|\mathcal{A}_k|}(t_n)]$.
Hereby, $\pi_{k,j}(t_n)$ is the probability with which BS $k$ chooses
its action $a_{j(k)}(t_n)$ at time $t_n$. The following learning steps
are performed in each learning iteration:

1) In the first learning iteration $t_n = 1$, the probability of
each action is equal and an action is selected randomly.

2) In the following learning iterations $t_n > 1$, the player
changes its action selection strategy only if the received
utility does not satisfy the cell, i.e., if the satisfaction
condition is not fulfilled.

3) If the satisfaction condition is not fulfilled, the player $k$
selects its action $a_{j(k)}(t_n)$ according to the probability
distribution $\pi_k(t_n)$.

4) Each player $k$ receives a reward $\phi_{k,\text{tot}}(t_n)$ based on the
selected actions.

5) The probability $\pi_{k,j}(t_n)$ of action $a_{j(k)}(t_n)$ is updated
according to the linear reward-inaction scheme:

$$\pi_{k,j}(t_n) = \pi_{k,j}(t_n - T_s) + \lambda \cdot b_k(t_n) \cdot \left( \mathbb{1}_{a_{j(k)}(t_n) = a_{i(k)}(t_n)} - \pi_{k,j}(t_n - T_s) \right),$$

whereby $\mathbb{1}_{a_{j(k)}(t_n) = a_{i(k)}(t_n)} = 1$ for the selected action
and zero for the non-selected actions, and $b_k(t_n)$ is
defined as follows:
$$b_k(t_n) = \frac{u_{k,\max} + \phi_{k,\text{tot}}(t_n) - u_{k,\min}}{2 \cdot u_{k,\max}},$$

with $u_{k,\max}$ being the maximum rate in case of single-UE
and $u_{k,\min} = \frac{1}{2} \cdot u_{k,\max}$. Hereby, $\lambda = \frac{1}{100 \cdot t_n + T_s}$ is
the learning rate.

Fig. 2: CDF of the UE throughput for 30 UEs and 1 pico BS
per macrocell and TTT = 480 ms.
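
One iteration of the satisfaction based steps above can be sketched in Python as follows. The function name and arguments are hypothetical, and the sketch makes two simplifying assumptions: the satisfaction condition is reduced to a single threshold on the total rate (the per-UE condition on 90% of the UEs is omitted), and the probability update is applied to the action that produced the reward before the next action is chosen.

```python
import random

def satisfaction_step(probs, played, reward, u_max, u_min,
                      learning_rate, rng=random):
    """Return (new_probs, next_action) for one BS.

    probs: action probabilities; played: index of the action that produced
    the reward; reward: total rate of the cell for that action.
    """
    b = (u_max + reward - u_min) / (2.0 * u_max)
    # Linear reward-inaction update: move probability mass towards `played`.
    new_probs = [
        p + learning_rate * b * ((1.0 if j == played else 0.0) - p)
        for j, p in enumerate(probs)
    ]
    if reward >= u_min:
        next_action = played  # satisfied: keep the current action
    else:
        next_action = rng.choices(range(len(new_probs)), weights=new_probs)[0]
    return new_probs, next_action

# Hypothetical normalized rates: four actions, action 2 was played.
new_probs, next_action = satisfaction_step(
    [0.25] * 4, played=2, reward=0.8, u_max=1.0, u_min=0.5, learning_rate=0.1)
```

The update keeps the probabilities normalized, and a satisfied cell stops exploring, which matches the behavior discussed for the satisfaction based approach in Section IV.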

IV. Simulation Results 

The scenario used in the system-level simulations is based
on the configuration #4b HetNet scenario in [?]. Simulations
are performed with a modified version, extended with picocell
deployments, of the system level simulator presented in [?]. We
consider a macrocell consisting of three sectors, an inter-site
distance of 500 m, and $P = \{1, 2, 3\}$ pico BSs per macro
sector, randomly distributed within the macrocellular
environment. In each macro sector, $U = 30$ mobile UEs are randomly
dropped within a 60 m radius of each pico BS. The rationale
behind dropping all UEs around pico BSs is to obtain a large
number of handovers within a short time in order to avoid large
computation times due to the complexity of our system level
simulations. Each UE $i(k)$ has a randomly selected velocity
$v_{i(k)} \in \mathcal{V} = \{3; 30; 60; 120\}$ km/h and a random direction
of movement within $[0; 2\pi]$, so that both macro-to-pico and
pico-to-macro handovers may occur. We consider fast-fading
and shadowing effects in our simulations that are based on
3GPP assumptions [?]. To compare our results with other
approaches we consider a baseline MM approach as defined in
[?]. The UE performs RSRP measurements over one subframe
every 40 ms and reports this value. The Layer 1 filtering
averages the reported RSRP values every 200 ms to filter
out fast fading effects. This value is further averaged through
a first-order infinite impulse response (IIR) filter, which is known
as a Layer 3 filter. A handover is then triggered if the Layer 3
filtered handover measurement meets the handover event entry
condition in [?]. A UE is handed over to its target cell after
TTT. For the baseline MM approach, we consider proportional
fair based scheduling, with no information exchange between
macro and pico BSs. This baseline approach is referred to as
the classical HO approach.
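
The measurement chain of the baseline approach (RSRP samples every 40 ms, Layer 1 averaging over 200 ms, first-order IIR Layer 3 filter) can be sketched in Python as follows. The function names, the filter coefficient `a = 0.5`, and the sample values are assumptions made for illustration. The sketch also averages directly in the dB domain for simplicity, which is a further assumption.

```python
def layer1_average(samples_dbm, group=5):
    """Average every `group` consecutive 40 ms samples (5 x 40 ms = 200 ms)."""
    return [sum(samples_dbm[i:i + group]) / group
            for i in range(0, len(samples_dbm) - group + 1, group)]

def layer3_filter(l1_values_dbm, a=0.5):
    """First-order IIR filter: F_n = (1 - a) * F_{n-1} + a * M_n."""
    filtered, prev = [], None
    for m in l1_values_dbm:
        # The first measurement initializes the filter state.
        prev = m if prev is None else (1.0 - a) * prev + a * m
        filtered.append(prev)
    return filtered

# Hypothetical RSRP trace: the signal drops by 10 dB after 200 ms.
samples = [-90.0] * 5 + [-100.0] * 5
l1 = layer1_average(samples)
l3 = layer3_filter(l1)
```

The Layer 3 output follows the 10 dB drop only partially after one filter step, which illustrates why fast UEs can leave a small cell before the filtered measurement and the TTT allow the handover to complete.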



Fig. 3: Sum-rate vs. number of pico BSs (PBSs) per macrocell and
TTT = 480 ms.


Fig. 2 depicts the cumulative distribution function (CDF)
of the UE throughput for the classical, MAB and satisfaction
based MM approaches. Compared to the classical approach,
the MAB and satisfaction based approaches lead to an
improvement of 43% and 75% on average (50th percentile), respectively.
Hence, the satisfaction based approach outperforms the other
MM approaches in terms of average UE throughput. In case
of the cell-center UE throughput, which is defined as the
95th percentile throughput, the opposite behavior is obtained. In
this case an improvement of 124% and 80% is achieved
for the MAB and satisfaction based approaches, respectively.
The reason is that the satisfaction based MM approach only
aims at satisfying the network in terms of rate and does not
update its learning strategy once satisfaction is achieved. The
MAB based approach, on the other hand, aims at maximizing
the network performance, which is reflected in the improved
cell-center UE throughput. The gains of the proposed MM
approaches are also reflected in the cell-edge UE throughput,
which is zoomed in Fig. 3. Here, the MAB and satisfaction
based approaches yield 39% and 80% improvement compared
to the classical approach.

To compare the performance of the proposed approaches
for different numbers of picocells per macrocell, Fig. 3 plots
the sum-rate versus the number of pico BSs per macrocell. For
different numbers of pico BSs, the proposed MM approaches
yield gains of around 70%-80% for TTT = 480 ms. In
Fig. 4, the sum-rate versus UE density per
macrocell is depicted for TTT = 40 ms and TTT = 480 ms. In
both cases, the classical approach yields very low rates, while
the proposed approaches lead to significant improvements of up
to 81% for TTT = 40 ms and 85% for TTT = 480 ms and
converge to a significantly larger sum-rate than the classical
approach.

Besides the gains in terms of rate, our proposed learning
based approaches also yield improvements in terms of HOF
probability as depicted in Fig. 5. For the HOF performance
evaluation, we modify our simulation settings by setting
the same velocity for each UE. Compared to the classical
MM approach, the proposed methods yield the same HOF
probability for UEs at 3 km/h speed. For higher velocities, at
which more HOFs are expected, the HOF probability obtained
by the proposed approaches is significantly lower than in case
of classical MM.

Fig. 4: Sum-rate vs. number of UEs per macrocell with 1 pico
BS and TTT = 40 ms and TTT = 480 ms.

Fig. 5: HOF and ping pong probability for 30 UEs and 1 pico
BS per macrocell and TTT = 40 ms and TTT = 480 ms.

Fig. 6: PP probability for 30 UEs and 1 pico BS per macrocell
and TTT = 40 ms and TTT = 480 ms.

The PP probability is depicted in Fig. 6. For TTT = 40 ms,
all MM methods yield very similar PP probabilities for
lower velocities, while this probability decreases for higher
velocities. This slope is aligned with the results presented
in [?]. However, for high velocity UEs, the PP probability
of the proposed MM approaches is half of the PP probability
obtained for the classical MM approach, which is
a significant improvement. The rationale behind this is that
both tiers perform CRE for load balancing, i.e., if one cell
tries to extend its coverage/hand over a UE, the other cell may
prevent this handover by extending its coverage, too. In case
of TTT = 480 ms almost no PPs are observed.

V. Conclusion 

We propose two learning based MM approaches and a history
based context-aware scheduling method for HetNets. The
first learning approach is based on MAB methods and aims
at system performance maximization. The second learning
method aims at satisfying each cell and each UE of a cell
based on satisfaction based learning. System level simulations
demonstrate the performance enhancement of the proposed
approaches compared to the classical MM method. While
gains of up to 80% are achieved on average for UE throughput, the
HOF probability is reduced by up to a factor of three by the
proposed learning based MM approaches.